The sample base rate is the base rate, or prevalence, in the sample data used in an experiment or as training data for machine learning. It may not be the same as the overall population base rate, especially if there is some form of sampling bias. Sometimes it is deliberately different, for example when using a stratified sample to ensure coverage of minority groups; in such cases it may be appropriate to correct estimates using weightings based on the population base rate.
It is important to report the sample base rate in any study outcomes and in the documentation of machine learning models, as it may differ from the presentation base rate in the final context of use.
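As a rough illustration of the weighting correction mentioned above, the following Python sketch reweights a deliberately unbalanced sample so that each group counts in proportion to its population base rate. The group labels, outcome values, and population rates are invented for the example, and the function names `base_rates` and `reweighted_mean` are hypothetical; this is one simple way to apply such weights, not a prescribed method.

```python
from collections import Counter


def base_rates(labels):
    """Return the proportion of each group label in the sample."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def reweighted_mean(values, labels, population_rates):
    """Mean of values, weighting each item by population rate / sample rate.

    Assumes every group in the sample also appears in population_rates.
    """
    sample_rates = base_rates(labels)
    weights = [population_rates[g] / sample_rates[g] for g in labels]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Hypothetical stratified sample: group B is 10% of the population
# but has been oversampled to make up 50% of the sample.
labels = ["A"] * 50 + ["B"] * 50
values = [1.0] * 50 + [3.0] * 50
population_rates = {"A": 0.9, "B": 0.1}

naive = sum(values) / len(values)  # treats the sample base rate as if it were the population's
corrected = reweighted_mean(values, labels, population_rates)

print(f"sample base rates: {base_rates(labels)}")
print(f"naive mean: {naive:.2f}, corrected mean: {corrected:.2f}")
```

With these made-up figures the naive mean is 2.0, while the corrected mean is about 1.2, because the oversampled group B is weighted back down to its 10% share of the population.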
Used in Chap. 14: page 180
Also used in hcistats2e: Chap. 7: page 81
Used in glossary entries: base rate, machine learning, population base rate, presentation base rate, sampling bias, stratified sample
